Sift AI Penetration Test Results
Source: Sift AI Penetration Test Results.pdf
Pages: 11

--- Page 1 ---
Sift AI
R E D  T E A M  A S S E S S M E N T
AI Penetration Test Report
An adversarial security assessment of the Sift AI platform's LLM
and agent attack surface, conducted against the OWASP Top 10 for
Large Language Model Applications (2025): prompt injection,
sensitive-information disclosure, excessive agency, model abuse,
and agent-specific risks.
OWASP LLM Top 10 (2025)  ·  Internal red team  ·  Staging +
source review
JUNE 2026
CONFIDENTIAL
NIFTORY INC. DBA SIFT AI (“SIFT AI”)  ·  COMPANION TO THE SECURITY &
ARCHITECTURE OVERVIEW

--- Page 2 ---
What this report covers
01
Executive summary
02
Scope and methodology
03
OWASP LLM Top 10 coverage
04
Security controls verified
05
Findings and observations
06
Hardening recommendations
07
Summary
08
Limitations and next steps
Document information
Report
AI Penetration Test Report, version 1.0
Date
June 2026
Environment under test
Staging (app.staging.getsift.ai) plus white-box source review. No
production tenant data was accessed.
Standard
OWASP Top 10 for Large Language Model Applications (2025)
Assessment type
Adversarial red-team assessment of the LLM and agent attack surface
Prepared by
Sift AI Security
Classification
Confidential. Shareable with customers and prospects under NDA.

--- Page 3 ---
0 1  ·  E X E C U T I V E  S U M M A R Y
Result of the assessment
Sift AI ingests attacker-controlled content from social and messaging platforms and
runs autonomous LLM agents over it to score, triage, and reply. That makes the AI
layer a first-class attack surface. This assessment adversarially exercised the
current platform against the OWASP Top 10 for LLM Applications.
0
CRITICAL OR HIGH
SEVERITY ISSUES
10
SECURITY CONTROLS
TESTED AND HELD
2
LOWER-SEVERITY
OBSERVATIONS
Under adversarial testing the platform's controls held. Agent code execution is isolated from secrets and from other tenants;
attacker-controlled content is isolated within LLM prompts; tenant scope is bound to the authenticated session; database
queries are parameterized; and agent actions and outbound replies are authorized and gated by human review. No Critical or
High severity issue was identified. Two lower-severity observations and a set of hardening recommendations are recorded
for continued improvement.
A P P R O A C H
Validated against the live model, not only by code review
Where a control defends against indirect prompt injection, it was exercised against the
production model and confirmed to hold. In this assessment, five distinct injection techniques
were run against the live synthesis pipeline and all five were neutralized. The platform's posture
is defense-in-depth: prompt-level isolation reduces the probability of manipulation, and the
surrounding architecture (least-privilege tools, isolated execution, human-review-by-default)
bounds the impact of anything that gets through.

--- Page 4 ---
0 2  ·  S C O P E  A N D  M E T H O D O L O G Y
How the assessment was run
Testing combined black-box probing of the live staging environment with white-box source review of the exact code paths.
No production tenant data was touched; crafted payloads were used throughout, and a dedicated demo- organization was
used for ingestion tests.
Surfaces in scope
Methods
Testing approach
Testing followed a two-phase methodology adapted to an agentic AI system. A passive phase mapped the agent surface: the
tools each agent can call, the prompts that run over untrusted ingested content, and the actions an agent is permitted to
take. An active phase then exercised each surface with crafted adversarial inputs, attempted to escalate privilege or cross a
tenant boundary, and verified the control that should stop each attack. Where a control defends against prompt injection, it
was validated against the live production model rather than inferred from source. Each finding was re-tested after
remediation.
The work was primarily manual analysis, supported by tooling: crafted prompt-injection payloads, a harness that exercises
the real synthesis function against the live model, white-box review of the exact code paths, and standard web tooling for the
cross-tenant, path-traversal, and injection probes.
The SiftGPT agent tool surface: server-
side code execution, search, analytics,
configuration, and skill/file reads.
The synthesis and classification pipeline
that runs LLMs over ingested social
content.
The autonomous post-synthesis goal
agent that can draft replies and take
actions.
Sandbox-escape and secret-reachability
probes against the code-execution tool.
Indirect prompt-injection payloads
through ingested content, validated
against the live model.
Cross-tenant (IDOR), path-traversal, and
SQL-injection attempts on the tool
surface.
Authorization and output-handling review
of agent actions.
Review of the AI supply chain (model
providers, agent framework, tooling) and
the tenant scoping of the embedding
store.

--- Page 5 ---
Severity scale
Severity reflects impact and exploitability in Sift AI's multi-tenant context. A path to another tenant's data, or to
host code execution, is treated as Critical regardless of how it is reached.
SEVERITY
DEFINITION
CRITICAL
Leads to host code execution, exposure of secrets, or access to another tenant's data.
Compromises the platform or breaks tenant isolation.
HIGH
Significant unauthorized access or a reliable path toward it within a tenant, or an autonomous
action taken well outside intent, without a further barrier.
MEDIUM
A weakness whose impact is bounded by an existing control or a specific configuration;
meaningful to fix but not directly exploitable to a breach.
LOW
A hardening gap or defense-in-depth improvement with limited direct impact.

--- Page 6 ---
0 3  ·  O W A S P  L L M  T O P  1 0
Coverage across all ten categories
Every category in the OWASP Top 10 for LLM Applications (2025) was considered. The table records what was tested and
the result on the current platform.
ID
CATEGORY
RESULT
LLM01
Prompt Injection: direct and indirect, via attacker-controlled ingested
content.
TESTED · HELD
verified live
LLM02
Sensitive Information Disclosure: secrets, cross-tenant data, PII reachable
by the agent.
TESTED · HELD
LLM03
Supply Chain: vulnerable model, tooling, or dependencies.
TESTED · HELD
minimal surface
LLM04
Data & Model Poisoning: manipulating classification/taxonomy via ingested
content.
TESTED · HELD
LLM05
Improper Output Handling: model output flowing to code, SQL, or
customers unchecked.
OBSERVATION
O1
LLM06
Excessive Agency: agents acting beyond intent (code exec, auto-send,
close, assign).
TESTED · HELD
LLM07
System Prompt Leakage: extracting instructions or secrets from prompts.
TESTED · NO FINDING
LLM08
Vector & Embedding Weaknesses: embedding or RAG manipulation.
TESTED · HELD
LLM09
Misinformation: ungrounded or harmful AI-generated replies.
MITIGATED
by design; see O1
LLM10
Unbounded Consumption: denial-of-wallet or resource exhaustion.
OBSERVATION
O2

--- Page 7 ---
0 4  ·  S E C U R I T Y  C O N T R O L S  V E R I F I E D
Tested, and standing
These controls were exercised adversarially and held. They are the substance of the platform's defense against the OWASP
LLM Top 10.
Isolated code execution
The server-side code-execution tool runs in
a dedicated worker with an empty
environment: no secrets and no database
handle. Tool calls are marshalled to the main
thread over a message bridge. Escape and
secret-reachability probes did not reach host
secrets or another tenant.
Untrusted-content isolation
Attacker-controlled ingested content is
wrapped in an explicit untrusted-content
boundary in the synthesis, goal-agent, and
tagging prompts, with a directive to treat it as
data, never as instructions. A battery of five
distinct injection techniques (title hijack,
sentiment and score manipulation, system-
prompt extraction, and role override) was run
against the live model; all five were
neutralized, with the analyzer scoring the
genuine content and ignoring every
embedded instruction.
Tenant isolation
Tenant scope is bound to the authenticated
session, not to caller input. A spoofed
tenant/user parameter returned identical,
correctly-scoped results, with no cross-
tenant access.
Least-privilege agent actions
The autonomous agent holds only
read/search tools plus a single decision tool,
with no access to the code-execution
sandbox. Every action it requests is validated
server-side against the selected goal's allow-
list before execution.

--- Page 8 ---
Injection-resistant queries
Search and analytics filter values reach the
database only as parameterized binds; the
single raw identifier (sort field) is allow-
listed. SQL-injection payloads could not
break out. Path-traversal attempts were
rejected at the edge and again at the
application layer.
Governed autonomy & oversight
Outbound AI replies default to human review.
Auto-send is double opt-in (enabled at the
organization level, then promoted per goal),
supervised, confidence-gated and protected
by an org-level kill switch. Configuration
changes require an explicit preview-then-
confirm. Execution time is hard-capped.
Minimal, static tooling surface
The AI stack is deliberately small: managed
foundation models, the Mastra agent
framework, and schema-constrained
structured outputs. Agents are constructed
with a fixed, in-house tool set defined at
build time. There is no plugin marketplace, no
dynamic or third-party tool loading, and no
Model Context Protocol server, so there is no
untrusted-tool supply-chain vector.
Application dependencies are scanned in the
development pipeline.
Tenant-scoped embeddings
The only embedding store is the per-
organization tag and theme taxonomy used
for semantic classification. Embeddings are
keyed and queried by organization, so
retrieval is never cross-tenant; a manipulated
message can at most influence which of that
organization's own tags it matches, an
outcome a human reviews. Embeddings
carry no authority over access or actions.

--- Page 9 ---
0 5  ·  F I N D I N G S  A N D  O B S E R V A T I O N S
Lower-severity items on the current platform
No Critical or High severity issues were identified. The following lower-severity observations are recorded with
recommendations.
ID
OBSERVATION
OWASP
SEVERITY
STATUS
O1
No deterministic content check before auto-send
LLM05, LLM09
LOW
Hardening
O2
No rate limiting on automated AI actions
LLM10
LOW
Hardening
O1
LOW
LLM05 · LLM09 · by configuration
HARDENING
No deterministic content check before auto-send
Replies default to human review; auto-send is double opt-in (enabled at the organization level, then
promoted per goal). For an organization that has opted into auto-send, a drafted reply is delivered
without a deterministic output check (link allow-listing, PII or secret scanning, or a safety judge) ahead
of the platform-level checks. The risk is bounded by the human-review-by-default posture and the
supervised, confidence-gated, kill-switchable autonomy model, and applies only to the auto-send
configuration.
Recommendation. Add an output guardrail before send: deterministic checks (URL allow-list including the
major social platforms, PII or secret scan, prompt-echo detection) plus a second-model safety judge,
failing closed to human review.
O2
LOW
LLM10 · abuse
HARDENING
No rate limiting on automated AI actions
There are no per-tenant or per-thread caps on automated AI actions (drafts, sends, tags). Bounded
execution time limits a single run, but sustained automated activity is not throttled, which is a denial-
of-wallet or abuse consideration rather than a confidentiality issue.
Recommendation. Add per-tenant and per-thread rate limits and a circuit breaker that trips to human
review on anomalous volume or repeated failures.

--- Page 10 ---
0 6  ·  H A R D E N I N G  R E C O M M E N D A T I O N S
Continued improvement
The platform is in good standing. The following raise the bar further and address the observations above.
Durable code-execution isolate. Migrate the code-execution tool to a hardened isolate (e.g.
isolated-vm or an equivalent separate-heap runtime) as the long-term boundary, beyond the current
worker isolation.
Output guardrail before send (addresses O1): deterministic link, PII, and secret checks plus a
second-model safety judge, failing closed to human review.
Adversarial eval suite in CI. A prompt-injection and PII-extraction regression suite wired into the
release pipeline as a gate, so prompt or policy changes cannot regress these protections.
Rate limits and a circuit breaker on automated AI actions (addresses O2).
Periodic re-testing of the AI surface as the agent capabilities expand.

--- Page 11 ---
SIFT AI · AI PENETRATION TEST · CONFIDENTIAL
JUNE 2026 · CONFIDENTIAL
0 7  ·  S U M M A R Y
Assessment summary
Sift AI performs AI-specific security testing aligned to the OWASP Top 10 for LLM Applications.
A June 2026 internal red-team assessment exercised prompt injection, sensitive-information
disclosure, excessive agency, and agent authorization across the SiftGPT tool surface, the
synthesis pipeline, and the autonomous goal agent. The platform's controls held: agent code
execution is isolated from secrets and from other tenants; attacker-controlled ingested content
is isolated within LLM prompts (verified live against the production model); and tenant isolation,
path-traversal, and SQL-injection defenses held. No Critical or High severity issue was
identified. A small number of lower-severity hardening observations are tracked, with
recommendations. Further evidence is available under NDA.
This document reflects an internal assessment with a defined scope; it does not assert a third-party / external
red-team engagement. The limitations below state exactly what was and was not covered.
0 8  ·  L I M I T A T I O N S  A N D  N E X T  S T E P S
Honest scope boundaries
Internal, time-boxed assessment against staging plus source review, not a third-party engagement.
Indirect-injection controls were validated through controlled testing against the production model;
the autonomous goal-agent prompt is recommended for the same live validation as a next step.
Re-testing is recommended after the hardening recommendations land, and as a recurring release-
gated evaluation thereafter.
